An Evaluation of Metadata and Data Quality on Person-Level, Aggregated, Thesauri, Statistical Classifications, and Rectangular Data Sets

Authors

  • Thomas Bosch
  • Benjamin Zapilko
  • Joachim Wackerow
  • Kai Eckert
Abstract

From 2012 to 2015, together with other Linked Data community members and experts from the social, behavioural, and economic sciences (SBE), we developed diverse vocabularies to represent SBE metadata and rectangular data in RDF. The DDI-RDF Discovery Vocabulary (Disco) is designed to support the dissemination, management, and reuse of person-level data, i.e., data about individuals, households, and businesses, collected in the form of responses to studies and archived for research purposes. The RDF Data Cube Vocabulary (Data Cube) is a W3C Recommendation for expressing data cubes, i.e., multi-dimensional aggregate data. Physical Data Description (PHDD) is a vocabulary to model data in rectangular format. The data can be represented either in records with character-separated values (CSV) or in fixed-length records. The Simple Knowledge Organization System (SKOS) is a vocabulary to build knowledge organization systems such as thesauri, classification schemes, and taxonomies. XKOS is a SKOS extension to describe formal statistical classifications. To ensure high quality of and trust in both metadata and data, their representation in RDF must satisfy certain criteria specified in terms of RDF constraints. In this paper, we evaluated the metadata and data quality of large real-world aggregated (QB), person-level (Disco), thesauri (SKOS), rectangular (PHDD), and statistical classification (XKOS) data sets by means of RDF constraints. RDF constraints are instances of RDF constraint types corresponding either to RDF validation requirements or to data-model-specific constraint types. We validated more than 4.2 billion triples and 15 thousand data sets using the RDF Validator, a validation environment which is available at http://purl.org/net/rdfval-demo.
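To make the notion of an RDF constraint concrete, the following is a minimal sketch of one data-model-specific check: the Data Cube integrity rule that every qb:Observation has exactly one associated qb:DataSet. It is not the RDF Validator's implementation. The tuple-based triple representation, the function name exact_cardinality_violations, and the ex: resources are assumptions made purely for illustration.

```python
from collections import defaultdict

# Illustrative only: triples are plain (subject, predicate, object) string tuples,
# not the data model or API of the RDF Validator mentioned in the abstract.
RDF_TYPE = "rdf:type"


def exact_cardinality_violations(triples, cls, prop, n=1):
    """Return the subjects typed as `cls` that do not have exactly `n`
    distinct values for `prop` (hypothetical helper for this sketch)."""
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    values = defaultdict(set)  # subject -> distinct objects of `prop`
    for s, p, o in triples:
        if p == prop and s in instances:
            values[s].add(o)
    return sorted(s for s in instances if len(values[s]) != n)


# Hypothetical sample data: ex:obs2 has no qb:dataSet and so violates the rule.
triples = [
    ("ex:obs1", RDF_TYPE, "qb:Observation"),
    ("ex:obs1", "qb:dataSet", "ex:ds1"),
    ("ex:obs2", RDF_TYPE, "qb:Observation"),
]

violations = exact_cardinality_violations(triples, "qb:Observation", "qb:dataSet")
print(violations)  # → ['ex:obs2']
```

A real validation run would express such a rule declaratively and evaluate it against an RDF store; the sketch only shows the shape of a cardinality constraint and of the violations it reports.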

Journal title:
  • CoRR

Volume abs/1504.04478

Publication date 2015